QiMeng-MuPa: Mutual-Supervised Learning for Sequential-to-Parallel Code Translation
Ke, Changxin, Zhang, Rui, Wang, Shuo, Ding, Li, Li, Guangli, Wen, Yuanbo, Zhang, Shuoming, Xu, Ruiyuan, Qin, Jin, Guo, Jiaming, Wang, Chenxi, Li, Ling, Guo, Qi, Chen, Yunji
The rise of GPU-based high-performance computing (HPC) has driven the widespread adoption of parallel programming models such as CUDA. Yet, the inherent complexity of parallel programming creates demand for automated sequential-to-parallel translation approaches. However, data scarcity poses a significant challenge for machine learning-based sequential-to-parallel code translation. Although recent back-translation methods show promise, they still fail to ensure functional equivalence in the translated code. In this paper, we propose QiMeng-MuPa, a novel Mutual-Supervised Learning framework for Sequential-to-Parallel code translation, to address the functional equivalence issue. QiMeng-MuPa consists of two models, a Translator and a Tester. Through an iterative loop of Co-verify and Co-evolve steps, the Translator and the Tester generate data for each other and improve collectively. The Tester generates unit tests to verify and filter functionally equivalent translated code, thereby evolving the Translator, while the Translator generates translated code as augmented input to evolve the Tester. Experimental results demonstrate that QiMeng-MuPa significantly enhances the performance of the base models: when applied to Qwen2.5-Coder, it not only improves Pass@1 by up to 28.91% and boosts Tester performance by 68.90%, but also outperforms the previous state-of-the-art method CodeRosetta by 1.56 and 6.92 in BLEU and CodeBLEU scores, while achieving performance comparable to DeepSeek-R1 and GPT-4.1. Our code is available at https://github.com/kcxain/mupa.
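To make the Co-verify step concrete, here is a minimal sketch of how Tester-generated unit tests could filter candidate translations; the `translator`, `tester`, and `run_test` interfaces are hypothetical stand-ins, not the released API.

```python
def co_verify(translator, tester, sequential_code, run_test, n_candidates=8):
    """Keep only translated candidates that pass every Tester-generated test."""
    # The Translator proposes several parallel (e.g., CUDA) candidates.
    candidates = [translator.translate(sequential_code) for _ in range(n_candidates)]
    # The Tester derives unit tests from the sequential source, so the tests
    # encode intended behavior independently of any particular translation.
    tests = tester.generate_tests(sequential_code)
    verified = [c for c in candidates if all(run_test(c, t) for t in tests)]
    return verified  # verified pairs become training data for the next round
```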
- Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- Europe > Germany > Berlin (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.89)
Pre-training Limited Memory Language Models with Internal and External Knowledge
Zhao, Linxi, Zalouk, Sofian, Belardi, Christian K., Lovelace, Justin, Zhou, Jin Peng, Noonan, Ryan Thomas, Go, Dongyoung, Weinberger, Kilian Q., Artzi, Yoav, Sun, Jennifer J.
Neural language models are black boxes: both linguistic patterns and factual knowledge are distributed across billions of opaque parameters. This entangled encoding makes it difficult to reliably inspect, verify, or update specific facts. We introduce Limited Memory Language Models (LMLM), a new class of language models that externalize factual knowledge to an external database during pre-training rather than memorizing it. Our pre-training approach strategically masks externally retrieved factual values from the training loss, thereby teaching the model to perform targeted lookups rather than rely on memorization in model weights. Our experiments demonstrate that LMLMs achieve performance competitive with significantly larger LLMs on standard benchmarks, while offering the advantages of explicit, editable, and verifiable knowledge bases.
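The loss-masking idea lends itself to a short sketch. The snippet below shows one plausible PyTorch formulation, assuming a precomputed boolean mask marking the tokens of retrieved factual values; LMLM's exact tokenization and masking scheme may differ.

```python
import torch.nn.functional as F

IGNORE = -100  # PyTorch's conventional ignore_index for cross-entropy

def masked_lm_loss(logits, labels, fact_value_mask):
    """Next-token loss that skips tokens belonging to retrieved facts.

    logits: (batch, seq, vocab); labels: (batch, seq) token ids;
    fact_value_mask: (batch, seq) bool, True where the target token is an
    externally retrieved factual value (an assumed preprocessing output).
    """
    labels = labels.masked_fill(fact_value_mask, IGNORE)
    return F.cross_entropy(
        logits.view(-1, logits.size(-1)), labels.view(-1), ignore_index=IGNORE
    )
```

Because masked positions contribute nothing to the gradient, the model is never rewarded for memorizing the fact itself, only for learning when to issue a lookup.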
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Japan > Kyūshū & Okinawa > Kyūshū > Kagoshima Prefecture > Kagoshima (0.04)
- South America > Uruguay (0.04)
- (14 more...)
- Research Report > New Finding (1.00)
- Personal (1.00)
- Media (0.93)
- Leisure & Entertainment > Sports > Soccer (0.68)
- Government > Military (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Mockingbird: How does LLM perform in general machine learning tasks?
Jia, Haoyu, Obinata, Yoshiki, Kawaharazuka, Kento, Okada, Kei
Large language models (LLMs) are now used with increasing frequency as chatbots, tasked with summarizing information or generating text and code according to user instructions. The rapid increase in the reasoning capabilities and inference speed of LLMs has revealed their remarkable potential for applications extending beyond chatbots to general machine learning tasks. This work is motivated by curiosity about that potential. We propose Mockingbird, a framework that adapts LLMs to general machine learning tasks, and evaluate its performance and scalability on several such tasks. The core concept of the framework is to instruct LLMs to role-play functions and reflect on their mistakes to improve themselves. Our evaluation and analysis show that LLM-driven machine learning methods such as Mockingbird can achieve acceptable results on common machine learning tasks; however, reflection alone currently cannot outperform the effect of domain-specific documents and feedback from human experts.
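A minimal sketch of the role-play-and-reflect loop, assuming only a generic `chat(prompt) -> str` LLM call; the prompts and helper names are illustrative, not Mockingbird's actual interface.

```python
def llm_as_classifier(chat, description, x, feedback_history):
    """Ask the LLM to 'be' a classifier function, carrying prior reflections."""
    prompt = (
        f"You are a function that implements: {description}\n"
        f"Lessons from earlier mistakes:\n{feedback_history}\n"
        f"Input: {x}\nReturn only the predicted label."
    )
    return chat(prompt).strip()

def reflect(chat, x, predicted, actual):
    """Turn one labeled error into a reusable 'lesson' string."""
    return chat(
        f"You predicted {predicted} for input {x}, but the answer was {actual}. "
        "State, in one sentence, what rule to apply next time."
    )
```

Accumulating `reflect` outputs into `feedback_history` is what lets the role-played "function" improve without any weight updates.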
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- North America > United States (0.04)
- Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Analyzing sequential activity and travel decisions with interpretable deep inverse reinforcement learning
Liang, Yuebing, Wang, Shenhao, Yu, Jiangbo, Zhao, Zhan, Zhao, Jinhua, Pentland, Sandy
Travel demand modeling has shifted from aggregated trip-based models to behavior-oriented activity-based models, because daily trips are essentially driven by human activities. To analyze sequential activity-travel decisions, deep inverse reinforcement learning (DIRL) has proven effective in learning decision mechanisms by approximating a reward function to represent preferences and a policy function to replicate observed behavior using deep neural networks (DNNs). However, most existing research has focused on using DIRL only to enhance prediction accuracy, with limited exploration of the underlying decision mechanisms guiding sequential decision-making. To address this gap, we introduce an interpretable DIRL framework for analyzing activity-travel decision processes, bridging data-driven machine learning and theory-driven behavioral models. Our framework adapts an adversarial IRL approach to infer the reward and policy functions of activity-travel behavior. The policy function is interpreted through a surrogate interpretable model based on its choice probabilities, while the reward function is interpreted by deriving both short-term rewards and long-term returns for various activity-travel patterns. Our analysis of real-world travel survey data reveals promising results in two key areas: (i) behavioral pattern insights from the policy function, highlighting critical factors in decision-making and variations among socio-demographic groups, and (ii) behavioral preference insights from the reward function, indicating the utility individuals gain from specific activity sequences.
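As a worked illustration of the long-term return derived from the learned reward, the sketch below computes a standard discounted return over one activity-travel sequence; `reward_fn` is a hypothetical stand-in for the reward network recovered by adversarial IRL.

```python
def discounted_return(reward_fn, states, actions, gamma=0.95):
    """G = sum_k gamma**k * r(s_k, a_k) over one activity-travel sequence.

    states/actions: aligned sequences of (e.g.) activity contexts and the
    activity or travel choices made in them; gamma discounts later steps.
    """
    return sum(
        (gamma ** k) * reward_fn(s, a)
        for k, (s, a) in enumerate(zip(states, actions))
    )
```

Comparing such returns across candidate activity patterns is one simple way to read the inferred utility of whole sequences, not just single choices.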
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > Singapore (0.04)
- North America > United States > Massachusetts (0.04)
- (4 more...)
- Transportation (1.00)
- Education (0.93)
- Consumer Products & Services > Travel (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.46)
NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models
Han, Han, Zhu, Tong, Zhang, Xiang, Wu, Mengsong, Xiong, Hao, Chen, Wenliang
Large language models (LLMs) combined with tool learning have achieved impressive results in real-world applications. During tool learning, LLMs may call multiple tools in nested orders, where a later tool call takes an earlier call's response as its input parameters. However, nested tool learning capabilities remain under-explored, since existing benchmarks lack relevant data instances. To address this problem, we introduce NesTools to bridge the current gap in comprehensive nested tool learning evaluation. NesTools comprises a novel automatic data generation method to construct large-scale nested tool calls with different nesting structures. With manual review and refinement, the dataset is of high quality and closely aligned with real-world scenarios. NesTools can therefore serve as a new benchmark for evaluating the nested tool learning abilities of LLMs. We conduct extensive experiments on 22 LLMs and provide in-depth analyses with NesTools, which show that current LLMs still struggle with complex nested tool learning tasks.
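To illustrate what nesting means here, the sketch below shows one plausible instance shape and a resolver that threads an earlier call's output into a later call's arguments; the schema and `$k` placeholder convention are assumptions for illustration, not NesTools' actual format.

```python
# Illustrative instance shape: the second call consumes the first's output.
example = {
    "question": "What is the weather in the capital of France?",
    "calls": [
        {"tool": "get_capital", "args": {"country": "France"}, "out": "$1"},
        {"tool": "get_weather", "args": {"city": "$1"}},  # nested via $1
    ],
}

def execute(calls, tools):
    """Run calls in order, resolving `$k` placeholders from earlier outputs."""
    outputs, result = {}, None
    for call in calls:
        args = {k: outputs.get(v, v) if isinstance(v, str) else v
                for k, v in call["args"].items()}
        result = tools[call["tool"]](**args)
        if "out" in call:
            outputs[call["out"]] = result
    return result
```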
- North America > United States > California (0.05)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (5 more...)
Integrating Dynamic Correlation Shifts and Weighted Benchmarking in Extreme Value Analysis
Panagoulias, Dimitrios P., Sarmas, Elissaios, Marinakis, Vangelis, Virvou, Maria, Tsihrintzis, George A.
This paper presents an innovative approach to Extreme Value Analysis (EVA) by introducing the Extreme Value Dynamic Benchmarking Method (EVDBM). EVDBM integrates extreme value theory to detect extreme events and is coupled with the novel Dynamic Identification of Significant Correlation (DISC)-Thresholding algorithm, which enhances the analysis of key variables under extreme conditions. By integrating return values predicted through EVA into the benchmarking scores, we transform these scores to reflect anticipated conditions more accurately. This provides a more precise picture of how each case is projected to unfold under extreme conditions. As a result, the adjusted scores offer a forward-looking perspective, highlighting potential vulnerabilities and resilience factors for each case in a way that static historical data alone cannot capture. By incorporating both historical and probabilistic elements, the EVDBM algorithm provides a comprehensive benchmarking framework that is adaptable to a range of scenarios and contexts. The methodology is applied to real photovoltaic (PV) production data, revealing critical low-production scenarios and significant correlations between variables, which aid risk management, infrastructure design, and long-term planning, while also allowing the comparison of different production plants. The flexibility of EVDBM suggests its potential for broader applications in other sectors where decision-making sensitivity is crucial, offering valuable insights to improve outcomes.
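The EVA building block used here, fitting a generalized extreme value (GEV) distribution and reading off a return level, can be sketched with SciPy as below; the synthetic data and return period are placeholders, and EVDBM's score-adjustment step is not reproduced.

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(0)
# Stand-in block maxima (Gumbel is the GEV with zero shape parameter).
block_maxima = rng.gumbel(loc=10.0, scale=2.0, size=200)

# Fit the GEV and compute the level exceeded once per T blocks on average.
shape, loc, scale = genextreme.fit(block_maxima)
T = 50  # return period in blocks (e.g., years)
return_level = genextreme.ppf(1 - 1 / T, shape, loc=loc, scale=scale)
print(f"{T}-block return level: {return_level:.2f}")
```

For low-production risk one would analyze block minima instead (e.g., by negating the series), but the fitting and return-level mechanics are the same.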
- Europe > Greece (0.04)
- Europe > Spain > Andalusia > Granada Province > Granada (0.04)
- Europe > Portugal (0.04)
- (2 more...)
- Health & Medicine (1.00)
- Energy > Renewable > Solar (1.00)
- Banking & Finance (1.00)
- Government (0.68)
ToolFlow: Boosting LLM Tool-Calling Through Natural and Coherent Dialogue Synthesis
Wang, Zezhong, Zeng, Xingshan, Liu, Weiwen, Li, Liangyou, Wang, Yasheng, Shang, Lifeng, Jiang, Xin, Liu, Qun, Wong, Kam-Fai
Supervised fine-tuning (SFT) is a common method to enhance the tool-calling capabilities of Large Language Models (LLMs), with the training data often being synthesized. The current data synthesis process generally involves sampling a set of tools, formulating a requirement based on these tools, and generating the call statements. However, randomly sampled tools lack relevance, making them difficult to combine and thus reducing the diversity of the data. Additionally, current work overlooks coherence between dialogue turns, leading to a gap between the synthesized data and real-world scenarios. To address these issues, we propose a Graph-based Sampling strategy to sample more relevant tool combinations, and a Planned-generation strategy to create plans that guide the synthesis of coherent dialogues. We integrate these two strategies and enable multiple agents to synthesize the dialogue data interactively, resulting in our tool-calling data synthesis pipeline, ToolFlow. Data quality assessments demonstrate improvements in the naturalness and coherence of our synthesized dialogues. Finally, we apply SFT on LLaMA-3.1-8B using 8,000 synthetic dialogues generated with ToolFlow. Results show that the model achieves tool-calling performance comparable to or even surpassing GPT-4, while maintaining strong general capabilities.
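A minimal sketch of graph-based tool sampling, assuming a precomputed relevance graph as an adjacency dict: the frontier-expansion walk keeps every sampled tool related to one already chosen, which is the property the strategy relies on. This is an illustration, not ToolFlow's implementation.

```python
import random

def sample_tool_combo(graph, k=3, seed=None):
    """Pick k tools that are connected in the tool-relevance graph.

    graph: dict mapping each tool name to a list of related tool names.
    """
    rnd = random.Random(seed)
    combo = [rnd.choice(list(graph))]
    while len(combo) < k:
        # Frontier: any not-yet-sampled neighbor of an already-sampled tool.
        frontier = {n for t in combo for n in graph[t] if n not in combo}
        if not frontier:
            break  # the connected component is exhausted
        combo.append(rnd.choice(sorted(frontier)))
    return combo
```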
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > China > Beijing > Beijing (0.05)
- Asia > China > Guangdong Province > Shenzhen (0.05)
- (4 more...)
Invisible Servoing: a Visual Servoing Approach with Return-Conditioned Latent Diffusion
Gerges, Bishoy, Bazzana, Barbara, Botteghi, Nicolò, Aboudorra, Youssef, Franchi, Antonio
In this paper, we present a novel visual servoing (VS) approach based on latent Denoising Diffusion Probabilistic Models (DDPMs). Unlike classical VS methods, the proposed approach can reach the desired target view even when the target is initially not visible. This is possible thanks to a learned latent representation that the DDPM uses for planning, together with a dataset of trajectories encompassing target-invisible initial views. The latent representation is learned using a Cross-Modal Variational Autoencoder and used to estimate the return for conditioning the trajectory generation of the DDPM. Given the current image, the DDPM generates trajectories in the latent space that drive the robotic platform to the desired visual target. The approach is applicable to any velocity-controlled platform. We test our method in simulated and real-world experiments using generic multi-rotor Uncrewed Aerial Vehicles (UAVs). A video of our experiments can be found at https://youtu.be/yu-aTxqceOA.
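A very coarse sketch of the inference-time flow described above; `encoder`, `return_estimator`, and `ddpm` are hypothetical modules standing in for the paper's Cross-Modal VAE, return estimator, and latent DDPM.

```python
def plan_trajectory(encoder, return_estimator, ddpm, image, horizon=32):
    """Return-conditioned trajectory generation from the current view."""
    z = encoder(image)            # latent code of the current (possibly
                                  # target-invisible) camera view
    g = return_estimator(z)       # estimated return used as conditioning
    # The DDPM denoises a latent trajectory conditioned on (z, g); each step
    # would then be decoded into a velocity command for the platform.
    return ddpm.sample(cond=(z, g), steps=horizon)
```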
- Europe > Netherlands (0.04)
- Europe > Italy > Lazio > Rome (0.04)
- Asia > Middle East > Jordan (0.04)
A Picture is Worth 500 Labels: A Case Study of Demographic Disparities in Local Machine Learning Models for Instagram and TikTok
West, Jack, Thiemt, Lea, Ahmed, Shimaa, Bartig, Maggie, Fawaz, Kassem, Banerjee, Suman
Mobile apps have embraced user privacy by moving their data processing to the user's smartphone. Advanced machine learning (ML) models, such as vision models, can now locally analyze user images to extract insights that drive several functionalities. Capitalizing on this new processing model of locally analyzing user images, we analyze two popular social media apps, TikTok and Instagram, to reveal (1) what insights vision models in both apps infer about users from their image and video data and (2) whether these models exhibit performance disparities with respect to demographics. As vision models provide signals for sensitive technologies like age verification and facial recognition, understanding potential biases in these models is crucial for ensuring that users receive equitable and accurate services. We develop a novel method for capturing and evaluating ML tasks in mobile apps, overcoming challenges like code obfuscation, native code execution, and scalability. Our method comprises ML task detection, ML pipeline reconstruction, and ML performance assessment, specifically focusing on demographic disparities. We apply our methodology to TikTok and Instagram, revealing significant insights. For TikTok, we find issues in age and gender prediction accuracy, particularly for minors and Black individuals. In Instagram, our analysis uncovers demographic disparities in the extraction of over 500 visual concepts from images, with evidence of spurious correlations between demographic features and certain concepts.
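The disparity-assessment step reduces, at its core, to comparing model quality across demographic groups. Below is a small sketch of such a check; the record fields and the max-min gap metric are illustrative choices, not the paper's exact protocol.

```python
from collections import defaultdict

def per_group_accuracy(records):
    """records: iterable of dicts with 'group', 'pred', and 'label' keys."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["group"]] += 1
        hits[r["group"]] += int(r["pred"] == r["label"])
    acc = {g: hits[g] / totals[g] for g in totals}
    gap = max(acc.values()) - min(acc.values())  # one simple disparity measure
    return acc, gap
```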
- North America > United States > New York > New York County > New York City (0.04)
- Asia > China (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- (5 more...)
Automatically Testing Functional Properties of Code Translation Models
Eniser, Hasan Ferit, Wüstholz, Valentin, Christakis, Maria
Large language models are becoming increasingly practical for translating code across programming languages, a process known as transpiling. Even though automated transpilation significantly boosts developer productivity, a key concern is whether the generated code is correct. Existing work initially used manually crafted test suites to test the translations of a small corpus of programs; these test suites were later automated. In contrast, we devise the first approach for automated, functional, property-based testing of code translation models. Our general, user-provided specifications about the transpiled code capture a range of properties, from purely syntactic to purely semantic ones. As shown by our experiments, this approach is very effective in detecting property violations in popular code translation models, and therefore, in evaluating model quality with respect to given properties. We also go a step further and explore the usage scenario where a user simply aims to obtain a correct translation of some code with respect to certain properties without necessarily being concerned about the overall quality of the model. To this purpose, we develop the first property-guided search procedure for code translation models, where a model is repeatedly queried with slightly different parameters to produce alternative and potentially more correct translations. Our results show that this search procedure helps to obtain significantly better code translations.
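A hedged sketch of the property-guided search idea: re-query the model with varied sampling parameters until a user-provided property holds. The `translate` and `prop` callables are assumed interfaces, not the paper's tool.

```python
def property_guided_search(translate, src, prop,
                           temps=(0.0, 0.2, 0.5, 0.8, 1.0)):
    """Return the first translation satisfying `prop`, plus all violations.

    translate(src, temperature) -> candidate translation (assumed interface);
    prop(src, candidate) -> bool, e.g., output equality on sample inputs.
    """
    violations = []
    for t in temps:
        candidate = translate(src, temperature=t)
        if prop(src, candidate):
            return candidate, violations
        violations.append((t, candidate))
    return None, violations  # no candidate satisfied the property
```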
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Software > Programming Languages (0.88)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)